Generalization, Overfitting, AIC

Author

  • Yashar Ahmadian
Abstract

First, some context. In supervised learning, the goal is to learn or infer an (initially) unknown function f : x ↦ y from a set of training data in the form of T "input/output" pairs {(x_μ, y_μ)}_{μ=1:T}. More generally, you try to infer the conditional distribution ρ(y|x) from this training set; the reason is that in general the outputs contain some noise (or, stated better, trial-to-trial variability not eliminated by controlling x), and therefore the y's are not given by a deterministic (and smooth) function of x. The ρ(y|x) thus captures and formalizes the "data generating process," and I will call it that. One simple special case is that of additive noise, where y = f(x) + ε, with the noise ε drawn independently of x, so that ρ(y|x) is just the noise distribution centered on f(x).
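To make this concrete, here is a minimal sketch of sampling a training set from an additive-noise data-generating process. The one-dimensional input, the sinusoidal f, the Gaussian noise, and all numerical values are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative stand-in for the unknown target function.
    return np.sin(2 * np.pi * x)

T = 50                         # number of training pairs
x = rng.uniform(0.0, 1.0, T)   # inputs (an assumed input distribution)
eps = rng.normal(0.0, 0.1, T)  # additive noise, drawn independently of x
y = f(x) + eps                 # outputs: deterministic part plus noise

# The training set: the pairs {(x_mu, y_mu)}, mu = 1..T.
training_set = list(zip(x, y))
```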


Similar resources

Penalty Functions for Genetic Programming Algorithms

Very often symbolic regression, as addressed in Genetic Programming (GP), is equivalent to approximate interpolation. This means that, in general, GP algorithms try to fit the sample as well as possible, but no notion of generalization error is considered. As a consequence, overfitting, code bloat, and noisy data are problems that are not satisfactorily solved under this approach. Motivated by...


Model complexity control for hydrologic prediction

A common concern in hydrologic modeling is overparameterization of complex models given limited and noisy data. This leads to problems of parameter nonuniqueness and equifinality, which may negatively affect prediction uncertainties. A systematic way of controlling model complexity is therefore needed. We compare three model complexity control methods for hydrologic prediction, namely, cross-validation...


Logistic Model Tree With Modified AIC

Logistic Model Trees have been shown to be very accurate and compact classifiers. Their greatest disadvantage is the computational complexity of inducing the logistic regression models in the tree. This issue is addressed by using the modified AIC criterion instead of cross-validation to prevent overfitting these models. In addition, to fill in missing values, class-wise mean and mode are used ...
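To illustrate the general pattern of using AIC in place of cross-validation for complexity control, here is a generic sketch: choosing a polynomial degree by minimizing AIC in a least-squares fit. This is not the paper's modified criterion or its logistic model trees; the data, the polynomial family, and the Gaussian-error AIC formula n·ln(RSS/n) + 2k are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(-1.0, 1.0, n)
# Data generated from a degree-2 polynomial plus Gaussian noise.
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.2, n)

def aic_for_degree(d):
    # Least-squares polynomial fit of degree d; for Gaussian errors,
    # AIC = n * ln(RSS / n) + 2k, up to an additive constant.
    coeffs = np.polyfit(x, y, d)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = d + 1  # number of fitted coefficients
    return n * np.log(rss / n) + 2 * k

# Pick the degree that minimizes AIC, rather than a cross-validation score.
best = min(range(1, 9), key=aic_for_degree)
print("degree selected by AIC:", best)
```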


An Approach to Reducing Overfitting in FCM with Evolutionary Optimization

Fuzzy clustering methods are conveniently employed in constructing a fuzzy model of a system, but they require tuning some parameters. In this research, FCM is chosen for fuzzy clustering. Parameters such as the number of clusters and the value of the fuzzifier significantly influence the extent of generalization of the fuzzy model. These two parameters require tuning to reduce overfitting in the...


Bias of the corrected AIC criterion for underfitted regression and time series models

The Akaike Information Criterion, AIC (Akaike, 1973), and a bias-corrected version, AICc (Sugiura, 1978; Hurvich & Tsai, 1989), are two methods for selection of regression and autoregressive models. Both criteria may be viewed as estimators of the expected Kullback-Leibler information. The bias of AIC and AICc is studied in the underfitting case, where none of the candidate models includes the true model...
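For reference, the standard forms of the two criteria discussed there, with L̂ the maximized likelihood, k the number of estimated parameters, and n the sample size (the AICc expression is the regression version of Hurvich & Tsai, 1989):

```latex
\mathrm{AIC}  = -2\ln\hat{L} + 2k,
\qquad
\mathrm{AICc} = -2\ln\hat{L} + 2k + \frac{2k(k+1)}{n - k - 1}
```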



Publication date: 2014